Approximate Inference via Weighted Rademacher Complexity

Authors

  • Jonathan Kuck
  • Ashish Sabharwal
  • Stefano Ermon
Abstract

Rademacher complexity is often used to characterize the learnability of a hypothesis class and is known to be related to the class size. We leverage this observation and introduce a new technique for estimating the size of an arbitrary weighted set, defined as the sum of weights of all elements in the set. Our technique provides upper and lower bounds on a novel generalization of Rademacher complexity to the weighted setting in terms of the weighted set size. This generalizes Massart’s Lemma, a known upper bound on the Rademacher complexity in terms of the unweighted set size. We show that the weighted Rademacher complexity can be estimated by solving a randomly perturbed optimization problem, allowing us to derive high-probability bounds on the size of any weighted set. We apply our method to the problems of calculating the partition function of an Ising model and computing propositional model counts (#SAT). Our experiments demonstrate that we can produce tighter bounds than competing methods in both the weighted and unweighted settings.

Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Introduction

A wide variety of problems can be reduced to computing the sum of (many) non-negative numbers. These include calculating the partition function of a graphical model, propositional model counting (#SAT), and calculating the permanent of a non-negative matrix. Equivalently, each can be viewed as computing the discrete integral of a non-negative weight function. Exact summation, however, is generally intractable due to the curse of dimensionality (Bellman 1961). As alternatives to exact computation, variational methods (Jordan et al. 1998; Wainwright, Jordan, and others 2008) and sampling (Jerrum and Sinclair 1996; Madras 2002) are popular approaches for approximate summation. However, they generally do not guarantee the estimate’s quality. An emerging line of work estimates and formally bounds propositional model counts or, more generally, discrete integrals (Ermon et al. 2013a; Chakraborty, Meel, and Vardi 2013; Ermon et al. 2014; Zhao et al. 2016). These approaches reduce the problem of integration to solving a small number of optimization problems involving the same weight function but subject to additional random constraints introduced by a random hash function. This amounts to approximating the #P-hard problem of exact summation (Valiant 1979) using the solutions of NP-hard optimization problems.

Optimization can be performed efficiently for certain classes of weight functions, such as those involved in the computation of the permanent of a non-negative matrix. If instead of summing (permanent computation) we maximize the same weight function, we obtain a maximum weight matching problem, which is in fact solvable in polynomial time (Kuhn 1955). However, adding hash-based constraints makes the maximum matching optimization problem intractable, which limits the applicability of randomized hashing approaches (Ermon et al. 2013c). On the other hand, fully polynomial-time randomized approximation schemes (FPRAS) do exist for non-negative permanent computation (Jerrum, Sinclair, and Vigoda 2004; Bezáková et al. 2006). This gives hope that approximation schemes may exist for other counting problems even when optimization with hash-based constraints is intractable.

We present a new method for approximating and bounding the size of a general weighted set (i.e., the sum of the weights of its elements) using geometric arguments based on the set’s shape. Rather than relying on hash-based techniques, our approach establishes a novel connection with Rademacher complexity (Shalev-Shwartz and Ben-David 2014). This generalizes geometric approaches developed for the unweighted case to the weighted setting, such as the work of Barvinok (1997), who uses similar reasoning but without connecting it with Rademacher complexity.
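The sum-versus-max contrast above can be made concrete with a toy brute-force sketch: the permanent sums the weight of every perfect matching (a #P-hard sum in general), while maximizing the same weight function is a maximum weight matching problem, solvable in polynomial time by the Hungarian algorithm. The 2×2 matrix below is purely illustrative, and both quantities are brute-forced over permutations only for clarity.

```python
from itertools import permutations
from math import prod

def permanent(W):
    """Sum over all perfect matchings: per(W) = sum_sigma prod_i W[i][sigma(i)].
    Brute force is O(n!); the sum itself is #P-hard in general."""
    n = len(W)
    return sum(prod(W[i][s[i]] for i in range(n))
               for s in permutations(range(n)))

def max_weight_matching(W):
    """Max over the same permutations. Unlike the sum, this optimization is
    solvable in O(n^3) by the Hungarian algorithm (brute-forced here)."""
    n = len(W)
    return max(prod(W[i][s[i]] for i in range(n))
               for s in permutations(range(n)))

W = [[0.5, 1.0],
     [2.0, 1.5]]
# per(W) = 0.5*1.5 + 1.0*2.0 = 2.75; best matching weight = 1.0*2.0 = 2.0
```

For positive weights, maximizing the product is equivalent to maximizing the sum of log-weights, which is the usual additive formulation of maximum weight matching.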
In particular, we first generalize Rademacher complexity to weighted sets. While Rademacher complexity is defined as the maximum of the sum of Rademacher variables over a set, weighted Rademacher complexity also accounts for the weight of each element in the set. Just as Rademacher complexity is related to the size of the set, we show that weighted Rademacher complexity is related to the total weight of the set. Further, it can be estimated by solving multiple instances of a maximum weight optimization problem, subject to random Rademacher perturbations. Notably, the resulting optimization problem turns out to be computationally much simpler than that required by the aforementioned randomized hashing schemes. In particular, if the weight function is log-supermodular, the corresponding weighted Rademacher complexity can be estimated efficiently, as our perturbation does not change the original optimization problem’s complexity (Orlin 2009; Bach and others 2013).

Our approach most closely resembles a recent line of work involving the Gumbel distribution (Hazan and Jaakkola 2012; Hazan, Maji, and Jaakkola 2013; Hazan et al. 2016; Balog et al. 2017; Mussmann and Ermon 2016; Mussmann, Levy, and Ermon 2017). There, the Gumbel-max idea is used to bound the partition function by performing MAP inference on a model where the unnormalized probability of each state is perturbed by random noise variables sampled from a Gumbel distribution. While very powerful, exact application of the Gumbel method is impractical, as it requires exponentially many independent random perturbations; one instead uses local approximations of the technique. Empirically, on spin glass models we show that our technique yields tighter upper bounds and similar lower bounds compared with the Gumbel method, given similar computational resources.
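The exact Gumbel-max identity behind this line of work is log Z = E[max_x (θ(x) + g_x)] − γ, where the g_x are i.i.d. standard Gumbel variables (one per state, hence the exponential cost the text mentions) and γ is the Euler–Mascheroni constant absorbing the Gumbel mean. A small Monte Carlo sketch on a toy spin glass, with an illustrative random instance, verifies the identity against brute-force summation:

```python
import math
import random

random.seed(0)

n = 3
# toy spin glass: random fields h[i] and couplings J[(i,j)] (illustrative instance)
h = [random.uniform(-1, 1) for _ in range(n)]
J = {(i, j): random.uniform(-1, 1) for i in range(n) for j in range(i + 1, n)}

def theta(x):
    """Log-weight (negative energy) of configuration x in {-1,+1}^n."""
    s = sum(h[i] * x[i] for i in range(n))
    s += sum(Jij * x[i] * x[j] for (i, j), Jij in J.items())
    return s

states = [tuple(1 if (k >> i) & 1 else -1 for i in range(n)) for k in range(2 ** n)]
log_Z = math.log(sum(math.exp(theta(x)) for x in states))  # exact, 2^n terms

def gumbel():
    """Standard Gumbel(0, 1) sample via inverse-CDF."""
    return -math.log(-math.log(random.random()))

# log Z = E[max_x (theta(x) + g_x)] - gamma; one fresh Gumbel per state,
# which is why the exact scheme needs exponentially many perturbations.
samples = 2000
mean_max = sum(max(theta(x) + gumbel() for x in states) for _ in range(samples)) / samples
est = mean_max - 0.5772156649015329  # subtract the Euler-Mascheroni constant
```

With 2000 samples the estimator's standard error is about π/√(6·2000) ≈ 0.03, so `est` lands close to `log_Z`; the local approximations cited above trade this exactness for tractable, per-variable perturbations.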
On a suite of #SAT model counting instances, our approach generally produces comparable or tighter upper and lower bounds given limited computation.

Background

Rademacher complexity is an important tool used in learning theory to bound the generalization error of a hypothesis class (Shalev-Shwartz and Ben-David 2014).

Definition 1. The Rademacher complexity of a set A ⊆ R^n is defined as:

    R(A) := E_c [ sup_{a ∈ A} Σ_{i=1}^n c_i a_i ],

where c = (c_1, ..., c_n) is a vector of independent Rademacher variables, each taking the values +1 and −1 with probability 1/2.
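This expectation can be estimated directly by sampling Rademacher vectors and solving the inner maximization, and the estimate can be checked against Massart's Lemma, which bounds R(A) by max_{a∈A} ||a||_2 · sqrt(2 log |A|). A minimal Monte Carlo sketch, where the set A and sample count are illustrative choices:

```python
import math
import random

random.seed(1)

# toy set A of m binary vectors in {0,1}^n (illustrative)
n, m = 10, 5
A = [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(m)]

def rademacher_complexity(A, samples=5000):
    """Monte Carlo estimate of R(A) = E_c[ max_{a in A} <c, a> ],
    with c drawn uniformly from {-1,+1}^n."""
    n = len(A[0])
    total = 0.0
    for _ in range(samples):
        c = [random.choice((-1, 1)) for _ in range(n)]
        total += max(sum(ci * ai for ci, ai in zip(c, a)) for a in A)
    return total / samples

est = rademacher_complexity(A)

# Massart's Lemma: R(A) <= max_a ||a||_2 * sqrt(2 log |A|)
massart = (max(math.sqrt(sum(ai * ai for ai in a)) for a in A)
           * math.sqrt(2 * math.log(len(A))))
```

Each Monte Carlo sample requires one maximization over A, which is the structure the paper exploits: for weighted sets, the inner problem becomes a randomly perturbed maximum weight optimization.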


Journal: CoRR
Volume: abs/1801.09028
Published: 2017